Examination of the structural components of the Abilitator–A self-report questionnaire on work ability and functioning aimed at the population in a weak labour market position

Objectives According to the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) panel, structural validity describes how well Patient-Reported Outcome Measures’ (PROM) scores reflect the dimensions of the measured construct. The main purpose of this study was to examine the structural components of the Abilitator, a co-developed self-report questionnaire on work ability and functioning for the population in a weak labour market position. Methods We examined to what extent the Abilitator has reflective and formative elements in its five summary scales: “C. Inclusion”, “D. Mind”, “E. Everyday life”, “F. Skills”, and “G. Body”. The Abilitator data sample (n = 4555, men 51%, mean age 37 years) was collected in 2017–2022 by the Finnish Institute of Occupational Health in cooperation with the European Social Fund Priority 5 projects in which the participants have multiple challenges to gain employment. For the structural components and validity analysis we implemented both Confirmatory Factor Analysis (CFA) and Exploratory Factor Analysis (EFA). Results Based on the COSMIN criteria for structural validity, the Abilitator reached approximate model fit with CFA when we analysed the different concepts of the questionnaire separately rather than in one unified model. An exception was “E. Everyday life” which was a formative summary scale, and it did not reach approximate fit. EFA showed that the items in the Abilitator’s summary scales loaded on ten factors. Conclusions The Abilitator had both reflective and formative elements in its structure. It reached structural validity in those separate concepts that were based on a reflective model. This study revealed interesting connections between different aspects of the Abilitator and produced valuable information for further modification of the questionnaire.


Introduction
The population's health and quality of life would both improve if there were less unemployment [1]. Minimizing long-term unemployment could also increase social inclusion and equality in working life and reduce marginalization [2][3][4]. Therefore, measures to promote employment should be a societal priority [1].
However, many working-age people face persistent difficulties finding sustainable employment. This may be due to a migrant background, low-level education or skills, short work history, chronic health problems, disabilities, prolonged unemployment, living in a rural area, or being young or old [5][6][7]. These people are in a weak labour market position and could benefit from individually designed measures to support their participation in employment [6,8].
In most Western European welfare states, the working-age population has access to a wide range of services to support their employment, education, health, and well-being. To assess which services might be the most suitable for each individual, we need validated, multi-professionally usable, quickly managed, and easily interpreted instruments. Alongside non-patient-reported assessments carried out by clinicians or service professionals, the use of Patient-Reported Outcome Measures (PROMs) has become popular [9][10][11][12]. PROMs are self-report questionnaires for individuals to assess their own situation in terms of health, health-related quality of life, and life situation [13][14][15][16][17][18]. The work ability and functioning of the unemployed are important aspects to consider, as the unemployed are known to have poorer health and work ability than those who are in employment [19][20][21]. For example, in Finland, health and work ability problems are estimated to be an obstacle to employment for 45% of the unemployed population [5].
Work ability is a multidimensional concept that combines health, functioning, basic skills, and the occupationally relevant attributes required for executing work tasks in an acceptable work environment [22,23]. It also contains elements related to the individuals' social networks and opportunities in their living environment [23]. Functioning is integrally linked to health and includes psychological, social, physical, and cognitive dimensions [24,25].
At present only a few validated PROMs for the multidimensional assessment of work ability and functioning are aimed at the population in a weak labour market position [26][27][28]. The Abilitator self-report instrument was developed in part to fill this gap. The Abilitator is a digital questionnaire that produces an individual report with written feedback and suggests further actions for maintaining or improving work ability and functioning. It is an indicative rather than a diagnostic instrument. It can be described as a "resource-oriented work ability mapping tool" for the general, non-observable work ability-related aspects that should be considered when building one's path towards employment [29]. On the one hand, the Abilitator helps unemployed individuals identify their strengths and challenges in terms of their work ability and functioning. On the other hand, by producing valuable information it helps service professionals suggest the most effective means to support their clients' transition towards employment [29,30].
The open data does not include any demographic information such as the participants' age, gender, or duration of unemployment. These pieces of information were considered to compromise the anonymity of the participants in the review of the data sharing protocol conducted by the Finnish Institute of Occupational Health (FIOH). The analyses of this paper cannot be fully replicated with the open data, because one additional piece of information is needed, i.e., the variable identifying the specific ESF projects, which is used for clustering the participants in the analyses. The demographic information, i.e., the participants' age, gender, and duration of unemployment, is merely background information describing the study population and plays no role in the actual data analysis. More information on data availability from FIOH is available at https://www.ttl.fi/en/privacy-notice/data-permits-for-datasets-of-the-finnish-institute-of-occupational-health. The data can be requested, and other questions concerning the data addressed, via the FIOH's data protection officer by email: dpo@ttl.fi.
A PROM can be considered of high-quality when it is valid, reliable, responsive, and interpretable [31].Previous studies on the Abilitator's content validity [29], concurrent validity [32], and intrarater test-retest reliability [33] have supported the view that the Abilitator is a high-quality self-report questionnaire.However, further evidence of its psychometric properties is required.By examining the internal structure of the Abilitator we can obtain further information on how the different concepts and items in the instrument are related to each other.
The Abilitator's conceptual model on work ability, i.e., the theoretical model of how different constructs within its concepts are related [31] is based on the multi-dimensional theory of the Work Ability House [23].This theory constructs work ability as the dynamic balance between individual-related resources such as health, functioning, motivation and skills, and the operational environment such as social networks and service structure [23,34].The model also covers elements related to work, working conditions, colleagues, and leadership.However, the Abilitator does not include these work-related aspects, because the majority of individuals in a weak labour market position are not in employment.Within work ability, the conceptual model of functioning is viewed as biopsychosocial [24].
In addition to being a self-report tool for work ability and functioning, the Abilitator was developed for a variety of other purposes, including setting individual goals in different areas of life and tracking their achievement. To meet these needs, the Abilitator combines existing questionnaires and some new questions into one self-report questionnaire [29]. The Abilitator's conceptual framework, i.e., the model representing the relationships between the items and the construct to be measured [31], has both reflective and formative elements. This means that some of the constructs in the Abilitator are reflected by all their items, whereas other constructs are formed by the items they include. The items in the formative constructs do not always correlate with each other [31,35]. For example, many different situations and circumstances might affect concepts such as coping in everyday life, but they do not all need to exist at the same time.
A measurement theory describes how the scores produced by the items represent the construct to be measured [31].This theory is especially important with multi-item instruments, which contain several indirectly measured unobservable items [31].The Abilitator's underlying measurement theory is the Classical Test Theory (CTT) [36].This means that in the constructs that are based on a reflective model, the information is obtained by measuring the items that display the construct.However, no well-developed theories are available for the constructs that are based on a formative model.These constructs are based on common sense [31].
According to the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) panel's taxonomy, construct validity is one of the main elements when assessing a PROM's measurement properties [37].Construct validity is defined as the degree to which the scores of a measurement tool are consistent with the presumed construct in terms of the internal relationships, with the scores of other tools, or with the differences between the groups [31,37].It is assessed when no gold standard is available [31] and when abstract variables such as social inclusion or coping with everyday life are observed [38].
Structural validity is a subtype of construct validity [31].It describes how well the PROM's scores reflect the dimensions of the measured construct [37].It has been proposed that structural validity is not pertinent for all types of PROMs [35].When a self-report tool is based on a reflective model and has effect indicators, i.e., items that are highly correlated, interchangeable and all indicate the same underlying construct, the assessment of structural validity is important.In contrast, structural validity is not relevant when a PROM is based on a formative model in which the items form the construct and are not necessarily correlated [35].
As the Abilitator has both formative and reflective elements in its conceptual framework, a structural validity assessment is not essential.However, we wanted to determine how reflective the Abilitator's summary scale elements are.It has also been recommended that when an instrument already exists and has a mixed conceptual framework, it should be considered a reflective model [35].
The overall aim of this study was to examine the internal structure and structural validity of the Abilitator.This aim was achieved.

The Abilitator self-report questionnaire
The Abilitator was co-developed in 2014-2017 at the Finnish Institute of Occupational Health (FIOH) in "The Social Inclusion and the Change of One's Work Ability and Capacity" (Solmu), a national coordination project funded by the European Social Fund (ESF) Priority 5 programme (2014-2023) [29,32,33,39].One of the main aims of the ESF Priority 5 programme was to improve the social inclusion, work ability, and functioning of those in a weak labour market position [40].Solmu's target was to jointly develop a PROM 'The Abilitator', which would evaluate both the work ability and functioning of those participating in the national Priority 5 projects and detect any changes after the projects [29].In addition to the ESF projects in Finland, the Abilitator has been used in research [8,41,42] and in employment, health, and social services to support primary-level decision-making when large numbers of clients meet professionals with various occupational backgrounds [43,44].
The Abilitator contains 84 items under nine domains (sections): "A. Personal details", "B. Well-being", "C. Inclusion", "D. Mind", "E. Everyday life", "F. Skills", "G. Body", "H. Background information", and "I. Work and the Future" [29,32,33]. Each section contains 4-14 items (S1 Appendix). The measure of each of the five reported sections C, D, E, F, and G is a summary scale, transformed into a score of 0% to 100% of the selected items (Fig 1, Table 1). The smallest detectable changes in the Abilitator's summary scale scores are presented in Wikström et al. [33]. The qualitative data is gathered from all the questions in sections A, H, and I, and from the separate questions C9-C13, F4, G2, G3, G8, and G9-G12. These questions are valuable for the respondents to reflect on, but they also provide useful additional information on the respondent's situation for the service professionals.
In this study, we examined the structural components of the summary scales formed from sections C, D, E, F, and G, which contain the dimensions of social functioning and participation, psychological functioning, managing in everyday life, cognitive functioning and basic skills, and physical functioning, respectively.The "Overall situation" score, which is the mean of the sums obtained from these five summary scales, was not included in the analysis.There are no limit values for this score and the obtained data is used by service professionals to monitor the overall change in the respondent's work ability and functioning, for example, when the Abilitator is applied for the second time after an intervention.Also, sections A, H, and I were excluded from the analysis since they were developed to provide background information and do not form a summary scale.Even though the Abilitator is available in 10 different languages, in this study we only examined the original Finnish version.

Study sample
The entire Abilitator data (n = 54464) has been collected by FIOH in various services and projects aimed mainly at the unemployed. To ensure that the participants in this study were in a weak labour market position, the sample utilized in this study (n = 4555) was collected in cooperation with the European Social Fund Priority 5 projects between 1 October 2017 and 31 December 2022. Each project (n = 150) gave written consent to FIOH to use their Abilitator data in an anonymous format in its research. All the Abilitator respondents in the data were participants in the ESF Priority 5 projects, which aim to improve the work ability and functioning of people not in employment to help them proceed on employment paths and to strengthen social inclusion [40]. In each project the participants took part in activities such as cooking, gardening, social interaction, light physical activity, and rehabilitative work tasks. The participants could choose whether to answer the questionnaire online via the Abilitator online service or in a paper format. In the latter case, the staff of each ESF project transferred the answers to the Abilitator online service with the permission of the respondent.

Research design
The structural examination of the Abilitator followed the guidelines of the COSMIN panel [31,35].These guidelines recommend the implementation of factor analysis (FA) to determine the dimensionality of the data based on item correlations [31].In FA, the items that are highly correlated with each other can be grouped in one factor.The purpose of FA is to find the number of meaningful factors in data or in a construct.Of the FA types, the use of exploratory factor analysis (EFA) is suggested if the number of the dimensions in the instrument is not known beforehand.The use of confirmatory factor analysis (CFA) is recommended if the dimensions of the instrument are predetermined [31].Because the latter was the case with the Abilitator, we carried out CFA to determine whether the collected data fits the assumed factor structure.In addition to this strictly confirmatory approach, an alternative model approach was taken to examine other models that would fit the Abilitator's factor structure well [45].After CFA it was also relevant to conduct EFA to discover how the Abilitator's items loaded freely.We used Mplus software version 8 to conduct the CFA and EFA [46].The same data sample was used for all the analyses, because EFA was not used to develop a reflective, unidimensional, data driven model, which should be confirmed in a separate sample.

Data analysis
In the CFA, we followed three phases: 1) preparation, 2) model testing and 3) reporting the results [47]. In the first phase, we prepared the data for the actual CFA analysis. The Abilitator data we used (n = 4555) consisted solely of the data collected from the ESF Priority 5 projects that had two or more participants. This included the individuals who had responded to the Abilitator in Finnish and were between 18 and 74 years old, i.e., belonged to the labour force in Finland. The respondents who regularly used an aid when moving around were excluded. We described the study population by age, gender, self-rated health, perceived general functioning, perceived work ability and duration of unemployment. After these preparations, we screened the data by clarifying the sample size and its variation. The recommended sample size for CFA is 3 to 15 times the number of variables [48]. To ensure sufficient variation, the data set should be over 400 cases, especially when the maximum likelihood with robust standard errors (MLR) method is used with continuous non-normal data containing missing values [49]. We also calculated and reported the percentages of missing responses and checked for multivariate outliers. In addition, we analysed the data set for possible floor and ceiling effects. These effects may occur if more than 15% of all responses score at the lower or upper end of the scales [31]. Both CFA and later EFA were conducted with multilevel modeling, which considers the clustering of participants in specific ESF projects (TYPE = COMPLEX). Missing values were handled with the missing at random (MAR) option in Mplus [46]. For EFA the type of rotation used was GEOMIN, i.e., an oblique rotation in which the dimensions of the Abilitator were allowed to correlate [45,46].
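The two screening rules described above (the 3 to 15 cases-per-variable range and the 15% floor and ceiling threshold) can be illustrated with a short sketch. This is not the authors' procedure, which was carried out in Mplus; the function names and the toy responses below are hypothetical, and only the thresholds come from the text.

```python
from collections import Counter


def recommended_sample_range(n_variables, low=3, high=15):
    """Return the (min, max) sample size under the 3-15 cases-per-variable rule."""
    return n_variables * low, n_variables * high


def floor_ceiling_share(responses, scale_min=1, scale_max=5):
    """Return the share of non-missing responses at the lowest and highest score."""
    valid = [r for r in responses if r is not None]  # None marks a missing answer
    counts = Counter(valid)
    n = len(valid)
    return counts[scale_min] / n, counts[scale_max] / n


def has_floor_or_ceiling(responses, threshold=0.15, scale_min=1, scale_max=5):
    """Flag an item when more than `threshold` of responses sit at either extreme."""
    floor, ceiling = floor_ceiling_share(responses, scale_min, scale_max)
    return floor > threshold or ceiling > threshold


# Hypothetical toy data for one item scored 1-5 (not Abilitator data).
toy_item = [1, 2, 3, 3, 4, 4, 4, 5, None, 3, 2, 3, 4, 3, 3, 2, 4, 3, 3, 4, 3]

print(recommended_sample_range(44))    # range for the 44 analysed items
print(floor_ceiling_share(toy_item))   # share at score 1 and at score 5
print(has_floor_or_ceiling(toy_item))  # True if either share exceeds 15%
```

Missing answers are dropped before the shares are computed, which is one possible choice; a different handling of missing data would change the denominators.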
In the second phase of CFA, we conducted the model testing, which was further divided into five sub-phases: 1) specification, 2) identification, 3) estimation, 4) evaluation, and 5) modification [47].In the specification phase, we selected the first model to be tested on the basis of the hypothesis that the summary scales of the Abilitator comprise 5 concepts and 44 selected items.We further selected the second set of models on the basis of the assumption that the Abilitator consists of five separate concepts with 3-12 items within each concept.
In the identification phase, we identified the tested models by fixing one loading in each concept (factor) to one as a reference variable and the rest to zero. Only those models that were identified could be estimated. If the model's degrees of freedom (df) were over 0, the model was over-identified, and if df was 0, it was just identified. If the model's df was under 0, it was not identified.
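The df rule above can be made concrete with a small counting sketch. It assumes a simple one-factor model in which one loading is fixed to one, no error terms are correlated, and the mean structure is ignored; the function names are hypothetical. A non-negative df is a necessary condition for identification rather than a guarantee of it.

```python
def cfa_degrees_of_freedom(n_items, n_free_parameters):
    """df = unique variances and covariances minus freely estimated parameters."""
    n_information = n_items * (n_items + 1) // 2
    return n_information - n_free_parameters


def one_factor_free_parameters(n_items):
    """Free parameters of a one-factor model with one loading fixed to 1.

    (n_items - 1) free loadings + n_items residual variances + 1 factor variance.
    """
    return (n_items - 1) + n_items + 1


def identification_status(df):
    """Translate df into the verbal categories used in the text."""
    if df < 0:
        return "not identified"
    if df == 0:
        return "just identified (saturated)"
    return "over-identified"


# One-factor models with 3, 4 and 12 indicators (illustrative sizes only).
for n_items in (3, 4, 12):
    df = cfa_degrees_of_freedom(n_items, one_factor_free_parameters(n_items))
    print(n_items, df, identification_status(df))
```

Under these assumptions a one-factor model with three indicators has df = 0, so it reproduces the data perfectly and its fit indices carry no information.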
In the estimation phase, we estimated the factor loadings and residual error terms for all the models using the MLR method (ESTIMATOR = MLR). In the evaluation phase, we analysed how well the selected models matched the data using five indicators: the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA), the Standardized Root Mean Square Residual (SRMR) and the Chi-square (χ²) test. Of these, CFI and TLI are incremental fit indices and utilize the worst possible model (null model) as a basis for assessing how well the model fits [45]. The other extreme is the best possible model (saturated model). RMSEA, SRMR and χ² are absolute fit indices for the whole model and are not compared with the null model [45]. The COSMIN cut-off limits for good model fit and measurement properties in CFA are CFI ≥ 0.95, TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08, and a χ² test p-value ≥ 0.05 [35]. However, if the measurement instrument is based on CTT, only one of these listed limits must be met to reach good measurement properties. In addition, the result of the χ² test is not required [35].
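As an illustration of the decision rule described above for CTT-based instruments (at least one of the listed limits met, with the χ² test not required), the following sketch checks a set of fit indices against the cut-offs. The function names and the example values are hypothetical and are not results from this study.

```python
def meets_cutoffs(cfi=None, tli=None, rmsea=None, srmr=None):
    """Return a dict showing which supplied fit indices meet the cut-offs."""
    checks = {}
    if cfi is not None:
        checks["CFI"] = cfi >= 0.95
    if tli is not None:
        checks["TLI"] = tli >= 0.95
    if rmsea is not None:
        checks["RMSEA"] = rmsea <= 0.06
    if srmr is not None:
        checks["SRMR"] = srmr <= 0.08
    return checks


def reaches_cutoff_rule(**indices):
    """True when at least one supplied index meets its cut-off.

    The chi-square p-value is deliberately left out, as it is not required.
    """
    return any(meets_cutoffs(**indices).values())


# Hypothetical fit indices for illustration only.
example = {"cfi": 0.91, "tli": 0.90, "rmsea": 0.07, "srmr": 0.06}
print(meets_cutoffs(**example))
print(reaches_cutoff_rule(**example))
```

Returning the per-index results alongside the overall verdict keeps it visible which index carried the decision, which matters when only one of four limits is met.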
In the modification phase, we further improved the tested models by utilizing the result of the EFA analysis, in which the data was allowed to load freely, and with the modification index (MI). MI is related to assessing the fixed parameters and is a tool for determining potential sources of model misfit [45]. With the knowledge of the highest MIs between the variables, we allowed some of the error terms to correlate to improve the model fit. For the EFA, factor loadings of ≥ 0.30 were considered substantial [31].
In the third phase of the CFA analysis, we reported the results for all the tested models with standardized values for factor loadings and error terms, correlations between concepts, and MIs in both graphic form and tables. The strength of the standardized factor loadings was considered sufficiently high if ≥ 0.5, but ideal if ≥ 0.7 [50]. For CFA we used STDYX standardization, which is recommended for continuous variables [46].
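The loading thresholds named in the text (≥ 0.30 as substantial in EFA; ≥ 0.5 as sufficient and ≥ 0.7 as ideal in CFA) can be applied mechanically, as in the sketch below. The function names and the loading values are hypothetical, and comparing absolute values is an assumption made here so that negative loadings are judged by their size.

```python
def classify_cfa_loading(loading):
    """Label a standardized CFA loading as 'ideal', 'sufficient' or 'low'."""
    size = abs(loading)  # assumption: the sign is ignored
    if size >= 0.7:
        return "ideal"
    if size >= 0.5:
        return "sufficient"
    return "low"


def substantial_efa_factors(loadings, threshold=0.30):
    """Return the 1-based factor numbers on which an item loads substantially."""
    return [
        factor
        for factor, loading in enumerate(loadings, start=1)
        if abs(loading) >= threshold
    ]


# Hypothetical loadings of one item on four factors (not values from Table 4).
item_loadings = [0.12, 0.45, -0.05, 0.31]
print(substantial_efa_factors(item_loadings))
print(classify_cfa_loading(0.62))
```

An item that returns an empty list loads substantially on no factor, and an item that returns two or more factor numbers is cross-loading.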

Ethics statement
The study was approved by the ethics board of the Finnish Institute of Occupational Health in June 2017. All the respondents had been given written information about the Abilitator's data security and the possible later research use of the data in anonymous format. They had also all consented to this by voluntarily responding to the Abilitator self-report questionnaire.

Study population characteristics
The study population (N = 4555, 51% men, 47% women and 2% other, mean age 37 years) was made up of participants of 150 different ESF Priority 5 projects (Table 2). Most had problems with their health, general functioning, and work ability. Eighteen per cent of the participants had been unemployed for less than a year, 21% for one to two years, 34% for three to ten years and 9% for over 10 years. Nine per cent of the participants had never worked and another 9% were in employment at the time of the questionnaire. In terms of the Abilitator's score categories (Table 1), 23% of the participants had a poor or fairly poor situation in section "C. Inclusion", 42% in "D. Mind", 7% in "E. Everyday life", 37% in "F. Skills", and 37% in "G. Body".

Confirmatory and exploratory factor analysis
In terms of variation, the sample size was large enough for CFA to be conducted. The requirement was either over 400 cases or 3-15 times the number of the 44 analysed variables, i.e., 132 to 660. The percentages of missing values were 0.92% for the whole sample, 0.76% for "C. Inclusion", 1.81% for "D. Mind", 0.73% for "E. Everyday life", 0.52% for "F. Skills", and 0.45% for "G. Body". No floor or ceiling effects were found, as less than 15% of the responses per analysed item or summary scale scored at the lowest or highest score of the scale. Therefore, the data was considered continuous and normally distributed. In addition, there were no outliers in the sample since all items could be scored from 1 to 5.
The results of the first tested model and the separate models with partial modifications are presented separately in graphic form (Figs 2-7). The ovals represent independent latent variables (concepts), and the circles are independent latent error terms. The rectangles are the dependent measured items. The two-way arrows represent the correlation or covariance between variables and the one-way arrows the standardized, unidirectional factor loadings. The goodness of fit indices are presented beside each graph and in Table 3. Table 3 also presents the goodness of fit indices for an alternative model of "F. Skills" without the Abilitator's item E4 and for the EFA with ten factors, which was the first well-fitting model. The Abilitator's factor loadings with EFA are presented in Table 4. Most of the items in the original summary scales loaded substantially on one to four different factors, but the items in "D. Mind" loaded on six factors. Items D3, D9 and E4 did not load substantially on any of the 10 factors. Items C6, C15, D5, E3, F6, F7 and F9 loaded substantially on two factors.

Discussion of the results
This study examined the structural components of the Abilitator, which is a self-report questionnaire on work ability and functioning aimed at the population in a weak labour market position. Our aim was to investigate to what extent the Abilitator has reflective elements in its five reported summary scales, and whether structural validity could be reached. The data used in the study was collected from the ESF Priority 5 projects, the aim of which is to support work ability, functioning and the overall well-being of those in a weak labour market position. The data corresponded to our prior expectations that the participants would belong to the target group: 64% were long-term unemployed, 63% had at least some health problems and 62% had problems functioning in everyday life. In addition, 72% perceived their work ability as fairly poor or poor. Therefore, the analyses in this study were conducted using a data sample from the intended population.
The models for CFA were specified based on two hypotheses: 1) All the concepts of the Abilitator can be analysed in the same model, and 2) All the concepts can be analysed as separate models. Both of these hypotheses were tested, and the results were compared to the COSMIN cut-off limits for the good model fit and measurement properties of CTT-based instruments. The aim was to reach at least an approximate model fit, i.e., that the analysed fit indices would fulfil at least acceptable or good criteria, but that in the χ² test the p-value would not need to be over 0.05. It must also be noted that the χ² test easily rejects models with samples larger than 500-600 [45]. The first model that tested all concepts in one model had reasonably high factor loadings of over 0.5 or 0.7, apart from the variable E4 in the "E. Everyday life" and "F. Skills" concepts. There were also high positive correlations among all the other concepts, except for "G. Body". The SRMR was the only fit index to reach an acceptable value. This first model was too complex and did not have a sufficiently good fit with CFA.
The results for CFA were better with the second approach, in which we tested the Abilitator's concepts as separate models with partial modifications. All the tested models had high factor loadings of over 0.5 or 0.7, except for item E4 in the "E. Everyday life" and "F. Skills" concepts. The tested second-order factor model for the "C. Inclusion" concept had high factor loadings and reached an approximate fit. For the "D. Mind" concept, the first-order factor model in which one pair of error terms was allowed to correlate on the basis of the MIs also reached an approximate fit. Similarly, for the "F. Skills" concept, the second-order factor model resulted in an approximate fit. For this concept we also tested an alternative model without item E4, which is usually counted in its summary scale. This new model reached a very good fit, which means that the concept "F. Skills" could also be used separately without item E4.
The concept "E. Everyday life" with the first-order factor model did not reach an approximate fit even after allowing three error terms to correlate on the basis of the MIs. On the one hand, the result indicates that this concept might need remodelling. On the other hand, this concept might also be formative in nature, in which case the CFA results can be ignored. The concept "E. Everyday life" is a combination of different aspects that are all related to coping with everyday activities, but they do not all need to be present at the same time for a person to find daily coping challenging. Therefore, we can assume that this concept might be based on a formative model. The model that tested the concept "G. Body" was saturated. This means that the model estimated as many parameters as there were pieces of information in the data and therefore the fit was perfect. In this case CFA did not produce a clear result. However, we were able to observe the factor loadings for this model and found that they were either acceptable or high.
The EFA showed that a well-fitting model of the Abilitator had ten different latent dimensions.We made some interesting observations when examining the factor loadings of the Abilitator's original summary scale items, i.e., which items were highly correlated and clustered in one factor.First, it was evident that some of the items in the two summary scales had similar contents.This was the case with item D1 "I've been feeling positive about the future" which clustered in the same factor with F5, F6 and F7 which also contained statements about the future, but from a slightly different perspective.The same was true for item C6 "I am in charge of the course of my life" which clustered with some parts of the original summary scale "C.Inclusion" but also with D4, D5, D7, and D8 which contained aspects of dealing with problems and life control.Second, it was interesting to notice that item F8 "I am able to verbally express myself in different situations" loaded on both the original summary scale "F.Skills" and on C15, C16, and C17, which contained aspects of socializing and friendships.The ability to express oneself verbally was connected to social relationships.Third, psychological well-being seems to be a comprehensive element of work ability and functioning in our study population, as the items in summary scale "D.Mind" clustered with many parts of "C.Inclusion" i.e., social inclusion, social relationships but also with some parts of skills "F.Skills" i.e., cognitive functioning and hopes for the future.This is supported by the literature as we know that unemployment impairs mental health and well-being [1,51].Mental disorders, neurological disabilities and substance abuse challenges have also been associated with low work ability in the unemployed [7,52].In contrast, good social relationships and experiencing life as meaningful support maintaining work ability in unemployment [53].
The result of EFA can also be viewed from the perspective of both the multidimensional model of work ability [22,23] and the biopsychosocial model of functioning [24,25], on which the Abilitator's underlying theory is based [29]. In our previous study [29] the items of the Abilitator were linked to the World Health Organization's International Classification of Functioning, Disability and Health (ICF), which provides a framework to describe and organize information on health-related states [24]. ICF is based on the biopsychosocial model, which provides a holistic understanding of functioning as interactions between different perspectives of health conditions, i.e., diseases, disorders, and injuries, and contextual factors, i.e., external environmental factors such as climate or social attitudes, and internal personal factors such as age or education. ICF further classifies three levels of functioning: 1) at the level of body or body part, 2) the whole person, and 3) the whole person in a social context. All these levels are interconnected and therefore each disability can be associated with impairments, activity limitations, and restrictions in participation in day-to-day life [54]. In this study, from the point of view of the Abilitator's ICF codes [29], the connection between the ability to express oneself verbally (d350 conversation) and socializing and friendships (e.g., d750 informal social relationships) can be seen as a logical result, as verbal communication is usually an essential part of social relationships. In addition, the clustering of psychological well-being (e.g., b152 emotional functions) with not just social inclusion (e.g., d7101 appreciation in relationships) and social relationships (e.g., d7500 informal relationships) but also with cognitive functioning (e.g., d160 focusing attention) and future hopes (b1265 optimism) supports the theory of ICF that a human being is a holistic entity [54].

Significance of the results
This study revealed that the Abilitator is more a combination of separate sets of questions on different aspects of work ability and functioning than one unified questionnaire. The "C. Inclusion", "D. Mind", and "F. Skills" concepts had acceptable structural validity [35]. The concept "G. Body" could also be considered reflective, even though the tested model was saturated. The concept "E. Everyday life" was formative in nature, and reaching structural validity was thus not relevant. From a practical point of view, this study supports the possibility of also using the different sections of the Abilitator separately. In addition, as the section "E. Everyday life" was formative, the responses to each item in this section should be viewed separately, because the result of this summary scale does not necessarily reflect the different aspects that might be challenging for the respondents in their daily life.
The results of this study support the basic idea behind the development of the Abilitator: the items in its concepts were based more on theory and on their usefulness in practice than on a data-driven approach, in which the items are selected to reflect a single well-defined latent trait [29]. Therefore, each concept of the Abilitator can be interpreted as a sensible, meaningful combination of items from wide-ranging conceptual categories, derived from input from both professionals and those in a weak labour market position. This approach is also supported by Coulter [13], who recommends considering what really matters to the respondents when developing PROMs.
This study also revealed opportunities to further modify the internal structure of the Abilitator. For example, the questionnaire could be made shorter and therefore easier for the respondents to fill in and for the service professionals to interpret. Item reduction could be conducted to form a shorter PROM that includes only those items that reflected the Abilitator's dimensions most strongly. However, this further development would need to be carried out carefully, without jeopardizing the relevance of the questionnaire to the respondents and its reflection of their situation from different perspectives. An example of this is item "E4. Using the internet, searching for information", which did not substantially reflect either the section "E. Everyday life" or "F. Skills", even though the item counts as a part of both scales. Nevertheless, difficulties with digital skills can hinder functioning in contemporary society, including in the labour market and employment.

Strengths and limitations
The main strength of this study was the carefully conducted examination of the structure of the Abilitator and the structural validation process, which was based on the widely accepted COSMIN methodology [31,35]. Our study design followed each step of the recommended process for CTT-based PROMs for which the underlying theory and the dimensions of the instrument were predetermined. We applied both CFA and EFA to examine the Abilitator's structure and analysed the results based on COSMIN's criteria for good measurement properties. At the end of the analysis, we reviewed the COSMIN bias checklist for structural validity studies [35]. We confirmed that the sample size was adequate, i.e., at least seven times the number of items and n ≥ 100. We have described the study design and the statistical methods used in sufficient detail. The COSMIN guidelines also encourage conducting an internal consistency analysis alongside the structural validity examination phase [35]. We had already successfully implemented the internal consistency analysis with the same items and sections in our previous study on the Abilitator's test-retest reliability [33]. However, the sample in that study was partly different, as it also contained groups of unemployed individuals slightly closer to the labour market threshold. In the future, we could conduct the internal consistency analysis again with the data sample used in the present study.
However, there are also limitations. The main limitation of this study relates to the context of the target population. The study was conducted only with a specific data sample collected in Finland and with individuals responding to the Finnish version of the Abilitator. Therefore, the results of this study apply only to the Finnish context. Moreover, the population in a weak labour market position is very heterogeneous. In the future, we could repeat the procedures of this study with other subgroups of the unemployed and with those responding to the Abilitator in other languages. In addition, those who regularly use an aid when moving around could be studied as a separate subgroup. However, the sample size has not yet been large enough to conduct these subgroup analyses in a way that meets the COSMIN criteria.
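The sample-size rule mentioned above (at least seven times the number of items and n ≥ 100) can be illustrated with a minimal sketch. The function name and the item count of 80 are hypothetical placeholders chosen for illustration only; the respondent count of 4555 is the sample size reported for this study.

```python
def sample_size_adequate(n_respondents: int, n_items: int,
                         ratio: int = 7, minimum: int = 100) -> bool:
    """Check the rule of thumb described in the text:
    n >= ratio * number of items AND n >= minimum."""
    return n_respondents >= ratio * n_items and n_respondents >= minimum


# n_items=80 is a hypothetical value, not the Abilitator's actual item count.
print(sample_size_adequate(n_respondents=4555, n_items=80))
```

Both conditions must hold at once, so a small instrument with few items still needs at least 100 respondents under this rule.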

Conclusions
For a co-developed multidimensional self-report questionnaire such as the Abilitator, it might be difficult to reach approximate model fit in one unified model, but this is possible when the questionnaire's different concepts are analysed separately. As hypothesized, the Abilitator had both reflective and formative elements. Structural validity was reached in the separate concepts that were based on a reflective model. If the Abilitator is revised, shortening the questionnaire could be beneficial for its quicker implementation and interpretation. This study provided valuable information for the possible item reduction to be conducted in the future.